Collaborating Authors

 Shanxi Province


MPCFormer: A physics-informed data-driven approach for explainable socially-aware autonomous driving

Hu, Jia, Lian, Zhexi, Yan, Xuerun, Bi, Ruiang, Shen, Dou, Ruan, Yu, Wang, Haoran

arXiv.org Artificial Intelligence

Autonomous Driving (AD) vehicles still struggle to exhibit human-like behavior in highly dynamic and interactive traffic scenarios. The key challenge lies in AD's limited ability to interact with surrounding vehicles, largely due to a lack of understanding of the underlying mechanisms of social interaction. To address this issue, we introduce MPCFormer, an explainable socially-aware autonomous driving approach with physics-informed and data-driven coupled social interaction dynamics. In this model, the dynamics are formulated into a discrete state-space representation, which embeds physics priors to enhance modeling explainability. The dynamics coefficients are learned from naturalistic driving data via a Transformer-based encoder-decoder architecture. To the best of our knowledge, MPCFormer is the first approach to explicitly model the dynamics of multi-vehicle social interactions. The learned social interaction dynamics enable the planner to generate manifold, human-like behaviors when interacting with surrounding traffic. By leveraging the MPC framework, the approach mitigates the potential safety risks typically associated with purely learning-based methods. Open-loop evaluation on the NGSIM dataset demonstrates that MPCFormer achieves superior social interaction awareness, yielding the lowest trajectory prediction errors compared with other state-of-the-art approaches. The prediction achieves an ADE as low as 0.86 m over a long prediction horizon of 5 seconds. Closed-loop experiments in highly interactive scenarios, where consecutive lane changes are required to exit at an off-ramp, further validate the effectiveness of MPCFormer. Results show that MPCFormer achieves the highest planning success rate of 94.67%, improves driving efficiency by 15.75%, and reduces the collision rate from 21.25% to 0.5%, outperforming a frontier Reinforcement Learning (RL) based planner.

A. Research motivation

In recent years, Autonomous Driving (AD) has made significant progress within transportation systems [1] [2]. However, AD vehicles still face significant challenges in exhibiting human-like behavior in highly dynamic and interactive traffic scenarios such as off-ramps and unprotected left turns [3] [4]. One critical reason is that AD vehicles lack an understanding of the underlying mechanisms of social interaction between surrounding vehicles.
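The ADE figure quoted in the MPCFormer abstract is an average displacement error. As a minimal sketch, assuming the usual definition (mean Euclidean distance between predicted and ground-truth positions over the prediction horizon), it can be computed as follows. The function name and the toy trajectories are illustrative and do not come from the paper.

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching (x, y) points of two trajectories."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = 0.0
    for (px, py), (gx, gy) in zip(predicted, ground_truth):
        total += math.hypot(px - gx, py - gy)
    return total / len(predicted)


# Toy example (made-up numbers): the per-step errors are 0 m, 1 m and 2 m.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade = average_displacement_error(pred, truth)
print(ade)
```

For a multi-vehicle dataset, this per-trajectory value would typically be averaged again over all evaluated trajectories.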


AI Agent for Source Finding by SoFiA-2 for SKA-SDC2

Zhou, Xingchen, Li, Nan, Jia, Peng, Liu, Yingfeng, Deng, Furen, Shu, Shuanghao, Li, Ying, Cao, Liang, Shan, Huanyuan, Ibitoye, Ayodeji

arXiv.org Artificial Intelligence

Source extraction is crucial in analyzing data from next-generation, large-scale sky surveys in radio bands, such as the Square Kilometre Array (SKA). Several source extraction programs, including SoFiA and Aegean, have been developed to address this challenge. However, finding optimal parameter configurations when applying these programs to real observations is non-trivial. For example, the outcomes of SoFiA depend strongly on several key parameters across its preconditioning, source-finding, and reliability-filtering modules. To address this issue, we propose a framework to automatically optimize these parameters using an AI agent based on a state-of-the-art reinforcement learning (RL) algorithm, Soft Actor-Critic (SAC). The SKA Science Data Challenge 2 (SDC2) dataset is used to assess the feasibility and reliability of this framework. The AI agent interacts with the environment by adjusting parameters based on feedback from the SDC2 score defined by the SDC2 Team, progressively learning to select parameter sets that yield improved performance. After sufficient training, the AI agent can automatically identify an optimal parameter configuration that outperforms the benchmark set by Team SoFiA within only 100 evaluation steps and with reduced time consumption. Our approach could address similar problems requiring complex parameter tuning, beyond radio band surveys and source extraction. However, high-quality training sets containing representative observations and ground-truth catalogs are essential.


RI-Loss: A Learnable Residual-Informed Loss for Time Series Forecasting

Wang, Jieting, Shang, Xiaolei, Li, Feijiang, Peng, Furong

arXiv.org Artificial Intelligence

Time series forecasting relies on predicting future values from historical data, yet most state-of-the-art approaches (including transformer- and multilayer perceptron-based models) optimize using Mean Squared Error (MSE), which has two fundamental weaknesses: its point-wise error computation fails to capture temporal relationships, and it does not account for inherent noise in the data. To overcome these limitations, we introduce the Residual-Informed Loss (RI-Loss), a novel objective function based on the Hilbert-Schmidt Independence Criterion (HSIC). RI-Loss explicitly models noise structure by enforcing dependence between the residual sequence and a random time series, enabling more robust, noise-aware representations. Theoretically, we derive the first non-asymptotic HSIC bound with explicit double-sample complexity terms, achieving optimal convergence rates through Bernstein-type concentration inequalities and Rademacher complexity analysis. This provides rigorous guarantees for RI-Loss optimization while precisely quantifying kernel space interactions. Empirically, experiments across eight real-world benchmarks and five leading forecasting models demonstrate improvements in predictive performance, validating the effectiveness of our approach. The code is publicly available at: https://github.com/shang-xl/RI-Loss.
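For readers unfamiliar with HSIC, the following is a minimal sketch of the standard biased empirical estimator, tr(KHLH) / n^2, for two scalar series with Gaussian kernels. It is not the RI-Loss implementation from the linked repository. The kernel choice, the bandwidth, the 1/n^2 normalization (some references use 1/(n-1)^2), and all function names are assumptions made for illustration.

```python
import math


def rbf_gram(series, bandwidth=1.0):
    """Gaussian (RBF) Gram matrix of a scalar series."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in series]
            for a in series]


def center(gram):
    """Double-center a symmetric Gram matrix, i.e. compute H K H with H = I - 11^T / n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    # The matrix is symmetric, so column means equal row means.
    return [[gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
            for i in range(n)]


def hsic(x, y, bandwidth=1.0):
    """Biased empirical HSIC between two equally long scalar series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    kc = center(rbf_gram(x, bandwidth))
    lc = center(rbf_gram(y, bandwidth))
    # tr(K H L H) equals the element-wise inner product of the two centered matrices.
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / (n * n)


residuals = [0.0, 0.5, 1.0, 1.5, 2.0]   # made-up residual sequence
constant = [3.0] * 5                    # carries no information about the residuals
print(hsic(residuals, residuals), hsic(residuals, constant))
```

A series measured against itself gives a positive value, while a constant series gives exactly zero because its centered Gram matrix vanishes.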


Optimized scheduling of electricity-heat cooperative system considering wind energy consumption and peak shaving and valley filling

Ye, Jin, Wang, Lingmei, Zhang, Shujian, Wu, Haihang

arXiv.org Artificial Intelligence

With the global energy transition and the rapid development of renewable energy, the scheduling optimization challenge for combined power-heat systems under new energy integration and multiple uncertainties has become increasingly prominent. To address this challenge, this study proposes an intelligent scheduling method based on an improved Twin Delayed Deep Deterministic Policy Gradient algorithm (PVTD3). System optimization is achieved by introducing a penalty term for grid power purchase variations. Simulation results demonstrate that under three typical scenarios (10%, 20%, and 30% renewable penetration), the PVTD3 algorithm reduces the system's comprehensive cost by 6.93%, 12.68%, and 13.59%, respectively, compared to the traditional TD3 algorithm. It also reduces the average fluctuation amplitude of grid power purchases by 12.8%. Regarding energy storage management, the PVTD3 algorithm reduces the end-time state values of low-temperature thermal storage tanks by 7.67-17.67 units while maintaining high-temperature tanks within the 3.59-4.25 safe operating range. Multi-scenario comparative validation demonstrates that the proposed algorithm not only excels in economic efficiency and grid stability but also exhibits superior sustainable scheduling capabilities in energy storage device management.
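The abstract does not give the form of the penalty term on grid power purchase variations. One simple possibility, shown here purely as a hypothetical sketch, is to subtract a weighted absolute change in purchased power from the per-step reward. The function name, the absolute-difference form, and the weight are all assumptions rather than the PVTD3 definition.

```python
def penalized_reward(operating_cost, grid_power, previous_grid_power, penalty_weight=0.1):
    """Negative cost with an extra penalty on step-to-step grid purchase changes.

    Hypothetical form: reward = -(cost + weight * |P_t - P_{t-1}|).
    """
    variation = abs(grid_power - previous_grid_power)
    return -(operating_cost + penalty_weight * variation)


# Made-up numbers: the same operating cost, with and without a jump in purchased power.
steady = penalized_reward(10.0, 5.0, 5.0)
jumpy = penalized_reward(10.0, 8.0, 5.0, penalty_weight=0.5)
print(steady, jumpy)
```

Under this form, an agent that keeps purchases steady is rewarded more than one that reaches the same cost with large swings, which is the behavior the reported reduction in fluctuation amplitude suggests.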


Scalable Constrained Policy Optimization for Safe Multi-agent Reinforcement Learning

Neural Information Processing Systems

A challenging problem in seeking to bring multi-agent reinforcement learning (MARL) techniques into real-world applications, such as autonomous driving and drone swarms, is how to control multiple agents safely and cooperatively to accomplish tasks.



Pragmatic Theories Enhance Understanding of Implied Meanings in LLMs

Sato, Takuma, Kawano, Seiya, Yoshino, Koichiro

arXiv.org Artificial Intelligence

The ability to accurately interpret implied meanings plays a crucial role in human communication and language use, and language models are also expected to possess this capability. This study demonstrates that providing language models with pragmatic theories as prompts is an effective in-context learning approach for tasks that require understanding implied meanings. Specifically, we propose an approach in which an overview of pragmatic theories, such as Gricean pragmatics and Relevance Theory, is presented as a prompt to the language model, guiding it through a step-by-step reasoning process to derive a final interpretation. Experimental results showed that, compared to the baseline, which prompts intermediate reasoning without presenting pragmatic theories (0-shot Chain-of-Thought), our methods enabled language models to achieve up to 9.6% higher scores on pragmatic reasoning tasks. Furthermore, we show that even without explaining the details of pragmatic theories, merely mentioning their names in the prompt leads to a modest performance improvement (around 1-3%) in larger models compared to the baseline.



Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement

Yang, Zhe, Li, Wenrui, Chen, Hongtao, Wang, Penghong, Xiong, Ruiqin, Fan, Xiaopeng

arXiv.org Artificial Intelligence

Multimodal learning aims to improve performance by leveraging data from multiple sources. During joint multimodal training, due to modality bias, the advantaged modality often dominates backpropagation, leading to imbalanced optimization. Existing methods still face two problems. First, the long-term dominance of the dominant modality weakens representation-output coupling in the late stages of training, resulting in the accumulation of redundant information. Second, previous methods often directly and uniformly adjust the gradients of the advantaged modality, ignoring the semantics and directionality between modalities. To address these limitations, we propose Adaptive Redundancy Regulation for Balanced Multimodal Information Refinement (RedReg), which is inspired by the information bottleneck principle. Specifically, we construct a redundancy phase monitor that uses a joint criterion of effective gain growth rate and redundancy to trigger intervention only when redundancy is high. Furthermore, we design a co-information gating mechanism to estimate the contribution of the current dominant modality based on cross-modal semantics. When the task primarily relies on a single modality, the suppression term is automatically disabled to preserve modality-specific information. Finally, we project the gradient of the dominant modality onto the orthogonal complement of the joint multimodal gradient subspace and suppress the gradient according to redundancy. Experiments show that our method outperforms current major methods in most scenarios. Ablation experiments verify the effectiveness of our method. The code is available at https://github.com/xia-zhe/RedReg.git.

Index Terms: Multimodal learning, modality imbalance, information bottleneck

This work was supported in part by the National Key R&D Program of China (2023YFA1008501) and the National Natural Science Foundation of China (NSFC) under grants 624B2049 and U22B2035. Wenrui Li, Penghong Wang, and Xiaopeng Fan are with the Department of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China, and also with Harbin Institute of Technology Suzhou Research Institute, Suzhou 215104, China. Hongtao Chen is with the School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China (e-mail: ht166chen@163.com).
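The gradient projection step described in the RedReg abstract can be illustrated with a minimal sketch. It assumes the joint multimodal gradient subspace is spanned by a single vector, so that projecting onto its orthogonal complement reduces to removing one component. The function names are hypothetical, and the redundancy-dependent suppression mentioned in the abstract is not shown.

```python
def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_onto_orthogonal_complement(gradient, joint_gradient, eps=1e-12):
    """Remove from `gradient` its component along `joint_gradient`.

    Assumes a one-dimensional joint subspace: g_perp = g - (<g, j> / <j, j>) * j.
    """
    denom = dot(joint_gradient, joint_gradient)
    if denom < eps:
        # A (near-)zero joint gradient defines no direction, so nothing is removed.
        return list(gradient)
    scale = dot(gradient, joint_gradient) / denom
    return [g - scale * j for g, j in zip(gradient, joint_gradient)]


# Made-up vectors: the dominant-modality gradient has a component along the joint one.
dominant = [3.0, 4.0]
joint = [1.0, 0.0]
projected = project_onto_orthogonal_complement(dominant, joint)
print(projected, dot(projected, joint))
```

The projected vector is orthogonal to the joint gradient, so an update along it leaves the shared direction untouched to first order.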


Uncovering and Mitigating Transient Blindness in Multimodal Model Editing

Han, Xiaoqi, Li, Ru, Yi, Ran, Tan, Hongye, Liang, Zhuomin, Gutiérrez-Basulto, Víctor, Pan, Jeff Z.

arXiv.org Artificial Intelligence

Multimodal Model Editing (MMED) aims to correct erroneous knowledge in multimodal models. Existing evaluation methods, adapted from textual model editing, overstate success by relying on low-similarity or random inputs, obscuring overfitting. We propose a comprehensive locality evaluation framework covering three key dimensions: random-image locality, no-image locality, and consistent-image locality. These are operationalized through seven distinct data types, enabling a detailed and structured analysis of multimodal edits. We introduce De-VQA, a dynamic evaluation for visual question answering, uncovering a phenomenon we term transient blindness: overfitting to edit-similar text while ignoring visuals. Token analysis shows that edits disproportionately affect textual tokens. We propose locality-aware adversarial losses to balance cross-modal representations. Empirical results demonstrate that our approach consistently outperforms existing baselines, reducing transient blindness and improving locality by 17% on average.